Patent abstract:
The invention relates to a method of assisting a hearing-impaired person equipped with computer equipment in following a conversation with a plurality of interlocutors, comprising steps of: - acquisition (E1, E2) of signals representative of a voice activity of a first interlocutor and characterizing a behavior of a second interlocutor in response to said voice activity of said first interlocutor, - determination (E3) of a membership of the first and second interlocutors in the same first conversation group according to the acquired signals, - selection (E4) of the first conversation group, - determination (E5), for said first conversation group, of a voice reproduction mode for transmission (E6) of the voice signals acquired for the interlocutors belonging to the first conversation group, or a text reproduction mode for display (E7) of text signals obtained by conversion of the voice signals acquired for the interlocutors belonging to the first conversation group.
Publication number: FR3047628A1
Application number: FR1650930
Filing date: 2016-02-05
Publication date: 2017-08-11
Inventor: Christophe Guedon
Applicant: Christophe Guedon
IPC main class:
Patent description:

GENERAL FIELD OF THE INVENTION

The invention relates to the field of assistance to persons with hearing loss, and more specifically concerns a method of helping a hearing-impaired person follow a conversation with interlocutors who may be participating in different conversations.
STATE OF THE ART
Hearing impaired people are usually equipped with hearing aids that amplify sounds to compensate for hearing difficulties.
Such prostheses are well suited to following a conversation with a small number of people in a quiet environment, but a person with hearing loss may continue to have difficulty following a conversation, despite the use of such prostheses, in an environment where many noises are superimposed on the conversation. Such a situation may arise when the conversation takes place in a noisy environment, for example in the street or in a room in which many people speak simultaneously, such as a canteen.
Some existing hearing aid systems allow a hearing-impaired person to select interlocutors of their choice so that only the words spoken by these interlocutors are returned to them, for example in amplified voice or text form. The hearing-impaired person can then focus on an ongoing conversation and ignore remarks from people in their environment who are involved in other conversations they do not want to follow.
Such systems nevertheless have the disadvantage of requiring manual selection of the interlocutors to follow or to ignore. Such a defect can be particularly troublesome when new interlocutors regularly enter the conversation, or on the contrary leave it, or when interlocutors participate in several conversations at the same time.
A hearing-impaired person using an existing system who wants to follow a single conversation, and to understand what all of its participants say about that conversation and only that conversation, must regularly correct the selection of interlocutors to follow, in order to exclude participants who join another conversation, even temporarily, and to include interlocutors who enter, possibly again, the conversation. Such regular correction of the selection may be tedious because of the attention and effort it requires, and may even prevent the person with hearing loss from understanding and participating in the conversation.
There is therefore a need for a conversation-following aid method enabling a person with hearing loss to effectively follow an ongoing conversation, even when the participants in the conversation change regularly and/or participate in other conversations, without requiring the hearing-impaired person to devote much attention or effort to the implementation of said method.
PRESENTATION OF THE INVENTION
The present invention thus relates, according to a first aspect, to a method of assisting the following of a conversation with a plurality of interlocutors, each of whom may belong to at least one conversation group, for a hearing-impaired person provided with computer equipment, said computer equipment being connected to at least one voice transmission module, to at least one display device, to at least one module for converting voice signals to text signals and to at least one voice activity detection system, said method comprising the following steps implemented by said computer equipment: - acquisition of a signal representative of a voice activity of a first interlocutor among the plurality of interlocutors, using said voice activity detection system, - acquisition of a signal characterizing a behavior of a second interlocutor among the plurality of interlocutors in reaction to said voice activity of said first interlocutor, - determination of a membership of the first and second interlocutors in the same conversation group, called the first conversation group, according to the signals acquired for the first interlocutor and the second interlocutor, - selection of the first conversation group among said conversation groups, - determination of a voice or text reproduction mode for said selected first conversation group, - when the voice reproduction mode has been determined for the first conversation group: transmission, to the voice transmission module, of the voice signals acquired for the interlocutors belonging to the first selected conversation group, for reproduction of said voice signals, - when the text reproduction mode has been determined for the first conversation group: control of the display, on the display device, of text signals obtained by conversion, using the signal conversion module, of the voice signals acquired for the interlocutors belonging to the first selected conversation group.
In a first mode of implementation of the method according to the first aspect, said computer equipment being further connected to at least one face orientation detection system, and to a synchronization equipment configured to determine a correlation between signals relating to different interlocutors: - the step of acquiring a signal representative of a voice activity of a first interlocutor comprises a determination of a first binary variable representative of a voice activity of said first interlocutor, - the step of acquiring a signal characterizing a behavior of a second interlocutor comprises a determination of a second binary variable representative of a detection of an orientation of the face of the second interlocutor towards the first interlocutor using the face orientation detection system, - the step of determining a membership of the first and second interlocutors in the same conversation group comprises determining a correlation, using said synchronization equipment, between said first and second variables.
In a second mode of implementation of the method according to the first aspect, said computer equipment being further connected to at least one gaze orientation detection system, and to a synchronization equipment configured to determine a correlation between signals relating to different interlocutors: - the step of acquiring a signal representative of a voice activity of a first interlocutor comprises a determination of a first binary variable representative of a voice activity of said first interlocutor, - the step of acquiring a signal characterizing a behavior of a second interlocutor comprises a determination of a third binary variable representative of a detection of an orientation of the gaze of the second interlocutor towards the first interlocutor using the gaze orientation detection system, - the step of determining a membership of the first and second interlocutors in the same conversation group comprises determining a correlation, using said synchronization equipment, between said first and third variables.
In a third mode of implementation of the method according to the first aspect: - the step of acquiring a signal characterizing a behavior of a second interlocutor comprises a determination, using said voice activity detection system, of a voice activity of the second interlocutor, - the membership of the first and second interlocutors in the same conversation group is determined according to a delay between an end of voice activity of the first interlocutor and a beginning of voice activity of the second interlocutor.
In an implementation variant, the computer equipment of the hearing-impaired person being connected to a device for detecting a movement of the hearing-impaired person, the steps of selecting a conversation group and/or determining a reproduction mode are implemented according to a movement of the hearing-impaired person detected by the motion detection device.
Furthermore, a voice activity of an interlocutor can be determined by spectral analysis of first voice signals acquired for said interlocutor or by word recognition applied to said first voice signals or by detection of a labial movement of said interlocutor.
According to a second aspect, the invention relates to a computer program product comprising code instructions for executing a method according to the first aspect when this program is executed by a processor.
According to a third aspect, the invention relates to computer equipment for assisting a hearing-impaired person in following a conversation with a plurality of interlocutors, configured to be connected to at least one voice transmission module, to at least one display device, to at least one voice-to-text signal conversion module and to at least one voice activity detection system, said computer equipment being further configured to implement the steps of the method according to the first aspect.
According to a fourth aspect, the invention relates to a system for following a conversation with a plurality of interlocutors for a hearing-impaired person, comprising: computer equipment according to the third aspect and a face and/or gaze orientation detection system comprising at least one stereoscopic camera configured to detect the orientation of the face and/or gaze of an interlocutor.
According to a fifth aspect, the invention relates to a system for following a conversation with a plurality of interlocutors for a hearing-impaired person, comprising: computer equipment according to the third aspect and a face orientation detection system comprising at least one geolocation system and at least one gyrometer and/or accelerometer configured to determine the location of an interlocutor and the orientation of his face relative to the other interlocutors.
PRESENTATION OF THE FIGURES

Other features and advantages will appear on reading the following description of an embodiment. This description will be given with reference to the appended drawings, in which: FIG. 1 illustrates an exemplary architecture for implementing the method according to one embodiment; FIG. 2 is a diagram schematizing an implementation of a method for following a conversation according to the invention.
DETAILED DESCRIPTION
Referring to Figure 1, the invention relates, in one implementation, to a method of assisting the following of a conversation with a plurality of interlocutors 1, for a hearing-impaired person 2 equipped with computer equipment 3. Each of these interlocutors is likely to belong to at least one conversation group.
This method is intended to automatically determine such conversation groups, each grouping several interlocutors participating in the same conversation. It is also intended to allow the hearing-impaired person to select one or more of these groups so that the conversations within these groups are returned to him, for example in voice or text form. The automatic determination of such groups, and especially their updating, spares the person with hearing loss from having to manually select the interlocutors he wishes to follow and from having to constantly update this selection according to the interlocutors entering or leaving a conversation.
To this end, the computer equipment 3 is likely to be connected to at least one voice transmission module 4, such as a speaker or headphones, to at least one display device 5 such as an LCD screen, to at least one voice-to-text signal conversion module 6, and to at least one voice activity detection system 7. The computer equipment 3 may comprise a processing module 8 such as a microprocessor, and an input interface 9 allowing the hearing-impaired person 2 to make a selection or to enter a command. According to one embodiment, the input interface is included in the display device 5 in the form of a touch screen. The computer equipment can be connected to these elements by one or more wired communication interfaces (coaxial audio cable, SPDIF/Toslink interface, HDMI, VGA, Ethernet, etc.) or wireless interfaces (Bluetooth, Wifi, etc.). Alternatively, all or some of these elements may be included in the computer equipment itself. The computer equipment 3 may consist of a portable device such as a smartphone or a tablet. Alternatively, the computer equipment 3 can be integrated into glasses with an augmented reality display. In this case, the information delivered to the hearing-impaired person 2 can be displayed on the lenses of the glasses, used as the display device 5. The hearing-impaired person can then see both the displayed information and his environment in transparency. According to another variant, the computer equipment 3 is shared between glasses used for display and carrying certain sensors, and a portable device comprising the processing module, the input interface and the rest of the elements of the computer equipment 3.
In an alternative embodiment, the display device may include a projector capable of projecting the information for the hearing-impaired person. Such a projection can be carried out conventionally on a projection medium, such as a wall or a screen, or even in 3D, for example in the form of a hologram. Such a projection, when it is intended to display a text corresponding to the words of an interlocutor, may project this text to a location in space aligned with the hearing-impaired person and this interlocutor, so that it is displayed in the field of vision of the person with hearing loss, near this interlocutor.
To determine the membership of the interlocutors in one or more conversation groups, the invention proposes, when a first interlocutor speaks, to analyze the behavior of the other interlocutors to determine whether they are listening to, or participating in, a conversation with this first interlocutor.
Voice activity detection system
The voice activity detection system 7 may be employed to determine a voice activity of the first interlocutor, i.e. his participation in a conversation. Such a voice activity detection system may include one or more microphones to pick up the voices of the interlocutors in the hearing-impaired person's environment. A microphone here means a sound signal acquisition module, whatever the technology used to make such an acquisition. Such a microphone can be directional, to reduce the pickup of unwanted interlocutor voices or noise, or bidirectional or omnidirectional, in order to capture a whole soundscape. Such microphones can be installed in a fixed manner, for example in a conference room or a restaurant. They can also be removable and installed by the hearing-impaired person, or even attached to him, for example on a garment, or included in the computer equipment itself. Alternatively, such microphones can be worn by each interlocutor, for example within a portable electronic device worn by each interlocutor.
The voice signals captured by such microphones may be spectrally analyzed, for example by the voice activity detection system 7 or by the processing module 8 of the computer equipment, to determine whether the analyzed signals are representative of a human voice, i.e. whether an interlocutor is speaking.
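As an illustration of this step, the spectral analysis described above can be approximated by a much simpler short-time energy criterion; this is a minimal sketch, not the patent's actual algorithm, and the frame size, threshold and synthetic signal are illustrative assumptions.

```python
import math

def frame_energy(samples):
    """Mean squared amplitude of a frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)

def detect_voice_activity(signal, frame_size=160, threshold=0.01):
    """Return one boolean per frame: True when the frame's energy exceeds
    the threshold (a crude stand-in for full spectral analysis)."""
    flags = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        flags.append(frame_energy(frame) > threshold)
    return flags

# Synthetic 8 kHz signal: one frame of "speech" (200 Hz tone), one of silence
speech = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
silence = [0.0] * 160
activity = detect_voice_activity(speech + silence)
# activity == [True, False]: voice detected in the first frame only
```

A production system would of course look at the spectral envelope rather than raw energy, so as not to confuse broadband noise with speech.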
Alternatively, voice recognition, i.e. word recognition, may be applied to the acquired voice signals, in an attempt to recognize words among the sounds emitted by an interlocutor. When such a recognition is positive, the interlocutor having pronounced these words is taking part in a conversation.
Alternatively, the voice activity detection system 7 may comprise one or more video cameras in order to detect a labial movement of the first interlocutor, a movement that is synonymous with speaking. The detection of a voice activity of the first interlocutor is then implemented by detecting a labial movement of this interlocutor.
Face orientation detection system
In a first embodiment, the invention proposes to detect the interlocutors of the first interlocutor who has spoken by analyzing the orientation of their faces. When a person speaks, people listening to his remarks tend to naturally turn towards him. The computer equipment is then connected to a face orientation detection system 10. The orientation of a face means the direction perpendicular to the plane of the face and passing through its center.
Such a face orientation detection system may include at least one camera, particularly a stereoscopic camera, to detect changes in face orientation of the speakers during the discussion. Such a system makes it possible to detect the face orientation of the interlocutors at a distance, in a non-intrusive manner, and by covering a large field, thus a large number of interlocutors with a single camera.
Alternatively, such a face orientation detection system may comprise at least one sensor configured to detect an angular variation of the face (gyrometer, accelerometer, etc.), in particular at least one gyrometer and/or accelerometer mounted on each interlocutor so as to capture their movements and deduce the orientation of their faces. Such a solution has the advantage of being more precise, because it is insensitive to environmental conditions such as ambient light or the presence of obstacles such as furniture or curtains, but it requires equipping each interlocutor. According to another alternative, such a face orientation detection system may comprise a 3D face reconstruction means, such as a 3D scanner (LIDAR, RADAR, ultrasound, etc.).
In order to determine the exact location towards which an interlocutor is oriented, and thus whether he is facing a speaking interlocutor, the face orientation detection system may also include at least one geolocation system to determine the exact position of the interlocutors in the hearing-impaired person's environment. Such geolocation can be carried out absolutely, using external geolocation devices such as GPS receivers. Such geolocation can also be implemented using a local installation. For example, the interlocutors can wear equipment that can be positioned by triangulation with respect to the access points of a wireless local area network to which it is connected, such as a Wifi network. The geolocation of an equipment can also be performed relative to other objects whose absolute position is known. The position of these objects can for example be recorded in an object coordinate database, and the equipment to be geolocated can determine its position with respect to these objects using a radar-type detector. These objects can be connected objects that identify themselves through a wireless network. According to another variant, each element to be geolocated is equipped with a motion detection device which may for example comprise a three-axis accelerometer and/or a three-axis gyroscope. During an initialization, the position of each element to be geolocated with respect to a reference point can be recorded. The evolution of the position of each element with respect to this reference point can then be followed by detecting the movements of these elements with their motion detection device. Geolocation from a database of object coordinates or using a motion detection device can make it possible to obtain a location accuracy greater than that obtained by a GPS or Wifi location.
In order to determine whether a second interlocutor is listening to a first interlocutor speaking, the computer equipment is connected to a synchronization equipment 11 configured to determine a potential correlation between a signal representative of a voice activity of the first interlocutor and a signal representative of an orientation of the face of the second interlocutor towards the first interlocutor. Such a correlation indeed indicates that the second interlocutor turns his head towards the first interlocutor when the latter speaks, and thus that these two interlocutors are engaged in the same conversation and must therefore belong to the same conversation group. Such synchronization equipment may be an autonomous device with memory, processing means and a communication interface for communicating with the computer equipment. Alternatively, the synchronization equipment can be one and the same as the computer equipment, whose processing module 8 can be used to determine such a correlation.
Gaze orientation detection system
In a second embodiment, more precise than the first, the invention proposes to detect the interlocutors of the first interlocutor who has spoken by analyzing the orientation of their gaze. When a person speaks, people listening to his words, without necessarily turning their faces towards him, may at least look at him. Such a situation can particularly arise when a person has two interlocutors located opposite him, both within his field of vision at the same time. This person can then look alternately at each of his interlocutors, according to their contributions to the conversation, without changing the orientation of his face. The computer equipment is then connected to a gaze orientation detection system 12.
Such a gaze orientation detection system may comprise a system implementing an oculometry ("eye-tracking") method making it possible to follow the orientation of the gaze of the interlocutors, for example one or more cameras, particularly infrared cameras, which can be used to implement such a method. Such cameras can be installed near the interlocutors, particularly in the context of a fixed installation such as that of a conference room, or even be worn by the interlocutors themselves, for example by being integrated into a helmet or glasses.
In this second embodiment, the gaze orientation detection system may also comprise at least one geolocation system similar to that described in the context of the first embodiment, in order to determine the exact position of the interlocutors in the hearing-impaired person's environment, and thus whether a second interlocutor is looking in the direction of a first interlocutor who is speaking.
In this second embodiment, in order to determine whether a second interlocutor is listening to a first interlocutor speaking, the computer equipment is also connected to a synchronization equipment 11 identical to that described in the first embodiment, and configured to determine a potential correlation between a signal representative of a voice activity of the first interlocutor and a signal representative of an orientation of the gaze of the second interlocutor towards the first interlocutor. Such synchronization equipment may be an autonomous device with memory, processing means and a communication interface for communicating with the computer equipment. Alternatively, the synchronization equipment can be one and the same as the computer equipment, whose processing module 8 can be used to determine such a correlation.
Process
More specifically, the steps described below can be implemented by the computer equipment.
During a first acquisition step E1, the computer equipment acquires a signal representative of a voice activity of a first interlocutor among the plurality of interlocutors, using the voice activity detection system 7. This signal may be transmitted to it by the voice activity detection system described above.
During a second acquisition step E2, the computer equipment acquires a signal characterizing a behavior of a second interlocutor among the plurality of interlocutors in response to said voice activity of said first interlocutor.
During a group determination step E3, the computer equipment determines the membership of the first and the second interlocutor in the same conversation group, called the first conversation group, according to the signals acquired for the first interlocutor and the second interlocutor. To do this, the computer equipment can determine whether such signals are correlated.
First and second embodiments
In the first and second embodiments, respectively based on the analysis of the orientation of the faces or the gaze of the interlocutors, during the first acquisition step E1 of a signal representative of a voice activity of a first interlocutor, the computer equipment can determine a first binary variable representative of a voice activity of said interlocutor. This binary variable may be determined by the computer equipment from the signal representative of a voice activity transmitted by the voice activity detection system 7, or transmitted directly by the voice activity detection system 7. Such a variable can thus take the value 1 when the first interlocutor speaks, and 0 when he does not speak.
In the first and second implementation modes, during the second acquisition step E2 of a signal characterizing a behavior of a second interlocutor, the computer equipment can determine a second binary variable representative of a detection of an orientation of the face or gaze, respectively, of the second interlocutor towards the first interlocutor, using the face or gaze orientation detection system. Such a detection can be implemented by checking whether a vector having as origin the position of the second interlocutor and as direction the orientation of his face or gaze is included in a cone of revolution whose axis is the line connecting the centers of the faces of the first and second interlocutors, and which has a predetermined radius in the plane of the first interlocutor. Such a radius is representative of a tolerable margin as to the orientation of a face or gaze, so that it is considered directed towards the first interlocutor even if it is not aimed exactly at the center of that interlocutor's face.
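The cone-of-revolution test just described is equivalent to checking the angle between the face (or gaze) direction and the axis joining the two interlocutors. A minimal 2D sketch, where the angular tolerance stands in for the patent's "predetermined radius" and all names and values are illustrative assumptions:

```python
import math

def is_oriented_toward(pos_listener, pos_speaker, face_dir, max_angle_deg=15.0):
    """True if the listener's face/gaze direction falls inside a cone whose
    axis joins the listener to the speaker, within an angular tolerance."""
    # Axis of the cone: from the listener towards the speaker
    axis = tuple(s - l for s, l in zip(pos_speaker, pos_listener))
    dot = sum(a * d for a, d in zip(axis, face_dir))
    norm_axis = math.sqrt(sum(a * a for a in axis))
    norm_dir = math.sqrt(sum(d * d for d in face_dir))
    # Angle between the two vectors, clamped against rounding errors
    cos_angle = max(-1.0, min(1.0, dot / (norm_axis * norm_dir)))
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg

# Listener at the origin, speaker 2 m ahead on the x axis
facing = is_oriented_toward((0, 0), (2, 0), (1, 0.1))    # roughly facing: True
sideways = is_oriented_toward((0, 0), (2, 0), (0, 1))    # facing sideways: False
```

The same test extends unchanged to 3D positions and direction vectors.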
In the first and second modes of implementation, during the group determination step E3, the computer equipment determines a correlation, using said synchronization equipment, between said first and second variables. The synchronization equipment can calculate a cross-correlation function between the first and the second variable, or a Pearson correlation coefficient relative to these two binary variables, and compare the result of such a calculation with a predetermined threshold. Such a threshold represents the minimum correlation level expected to consider that the movements of the face or gaze of the second interlocutor are correlated with the speech of the first interlocutor. If this threshold is exceeded, the first and second interlocutors are then considered as participants in a common conversation and must then be part of the same conversation group.
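The Pearson-correlation variant of this step can be sketched as follows; the sampled binary sequences, the 0.5 threshold and the function names are illustrative assumptions, not values from the patent:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (vx ** 0.5 * vy ** 0.5)

def same_conversation(voice_activity, facing_speaker, threshold=0.5):
    """First binary variable: 1 when the first interlocutor speaks.
    Second binary variable: 1 when the second interlocutor faces him.
    Grouped together if the correlation exceeds the threshold."""
    return pearson(voice_activity, facing_speaker) > threshold

speaks = [1, 1, 0, 0, 1, 1, 0, 0]   # sampled voice activity of interlocutor 1
faces  = [1, 1, 0, 0, 1, 0, 0, 0]   # interlocutor 2 mostly turns towards him
grouped = same_conversation(speaks, faces)   # True (r ≈ 0.77 > 0.5)
```

In practice the two sequences would be sampled on a common clock by the synchronization equipment 11.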
The first and second embodiments may be used together to determine the membership of two interlocutors in the same group from both the orientation of the face and the gaze of the second interlocutor.
Optionally, it is also possible to check the orientation of the face or gaze of the first interlocutor, and to consider that the first and second interlocutors should be part of the same conversation group only if the face or gaze of the first interlocutor is directed towards the second interlocutor.
Third embodiment
In a third embodiment, the membership of two interlocutors in the same conversation group can be determined by analyzing the succession of their speech turns. In the context of a conversation, interlocutors generally obey a code of conduct according to which they do not interrupt one another but take the floor fairly quickly when a speaker stops speaking. It is therefore possible to determine that a second interlocutor is responding to a first interlocutor when the second interlocutor starts speaking after a non-zero delay, below a predefined threshold, after the first interlocutor has finished speaking.
During the second acquisition step E2 of a signal characterizing a behavior of a second interlocutor, the computer equipment can then determine, with the aid of said voice activity detection system, a voice activity of the second interlocutor. The two acquisition steps E1 and E2 are then implemented in a similar way, respectively for the first and the second interlocutor, and similarly to the implementation of the first acquisition step in the context of the first or second embodiment.
During the group determination step E3, the membership of the first and second interlocutors in the same conversation group is then determined according to a delay between an end of voice activity of the first interlocutor and a beginning of voice activity of the second interlocutor.
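The turn-taking criterion of this third embodiment reduces to a single timing comparison. A minimal sketch, where the two-second maximum delay is an illustrative assumption rather than a value given by the patent:

```python
def responds_to(end_of_first_speech, start_of_second_speech, max_delay=2.0):
    """True when the second interlocutor starts speaking after a non-zero
    delay shorter than max_delay seconds following the first's last word,
    which is taken as evidence of a reply within the same conversation."""
    delay = start_of_second_speech - end_of_first_speech
    return 0.0 < delay <= max_delay

quick_reply = responds_to(10.0, 10.8)   # 0.8 s gap: True
too_late = responds_to(10.0, 16.0)      # 6 s gap: False
overlap = responds_to(10.0, 9.5)        # started before the first ended: False
```

The overlap case is excluded because, per the code of conduct described above, a participant in the same conversation does not cut the speaker off.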
An interlocutor who is not in conversation with the first interlocutor may speak in a second conversation immediately after the first interlocutor has finished speaking in a first conversation. In order to prevent these two interlocutors from being mistakenly considered part of the same conversation group, it is possible to analyze the interlocutors' speech over a period long enough to cover several exchanges within the same conversation, and to consider as part of the same conversation group only the interlocutors who have exchanged the floor most frequently, or a number of times higher than a predetermined threshold.
Fourth embodiment
In a fourth embodiment, the membership of two interlocutors in the same conversation group can be determined by analyzing the lexical field of each interlocutor. If two interlocutors belong to the same conversation group, their respective remarks will share the same lexical field. They may then use the same discriminating words. The term "discriminating word" means a word belonging to a specific vocabulary, that is to say, not being part of a vocabulary that is very common or shared by interlocutors belonging to different conversation groups. A calibration phase can be implemented to determine such a non-discriminating vocabulary, which can be excluded when determining the lexical fields of the interlocutors.
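The lexical-field comparison can be sketched by filtering out a non-discriminating vocabulary and intersecting what remains. The stop-word list, the minimum number of shared words and the example sentences below are all illustrative assumptions:

```python
# Hypothetical non-discriminating vocabulary, e.g. produced by a calibration phase
COMMON_WORDS = {"the", "a", "and", "to", "of", "is", "it", "i", "we", "at", "saw"}

def discriminating_words(utterance):
    """Words of an utterance that are not part of the common vocabulary."""
    return {w for w in utterance.lower().split() if w not in COMMON_WORDS}

def share_lexical_field(utterances_a, utterances_b, min_shared=2):
    """True when two interlocutors use at least min_shared identical
    discriminating words, suggesting a common conversation topic."""
    words_a = set().union(*(discriminating_words(u) for u in utterances_a))
    words_b = set().union(*(discriminating_words(u) for u in utterances_b))
    return len(words_a & words_b) >= min_shared

a = ["the regatta starts near the harbour", "we should check the spinnaker"]
b = ["i saw the spinnaker tear at the last regatta"]
shared = share_lexical_field(a, b)   # True: 'regatta' and 'spinnaker' are shared
```

A fuller implementation would also match "similar" words (stems, synonyms), as the text below allows.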
In this embodiment, the acquisition step E1 of a signal representative of a voice activity of a first interlocutor and the acquisition step E2 of a signal characterizing a behavior of a second interlocutor can then comprise a determination, using a voice recognition system, of the discriminating words used by each interlocutor; and the determination step E3 of a membership of the first and second interlocutors in the same conversation group may comprise a determination of the use of the same or similar discriminating words by these interlocutors.
This fourth embodiment can be used in conjunction with the other embodiments detailed above to determine the membership of two interlocutors in the same group.
This fourth embodiment is particularly interesting because it does not require a camera, unlike the face and/or gaze orientation detection systems of the first and second embodiments, which are sensitive to environmental conditions such as ambient light. This mode also makes it possible to compensate for the visual impairments of certain interlocutors, since it does not require detection of face and/or gaze orientation.

At the end of the determination step E3, if it is determined that the first and second interlocutors must be part of the same conversation group: - if neither interlocutor is part of a conversation group, such a group is created and both interlocutors are added to it, - if one of the two interlocutors is not part of any conversation group, he can be added to the conversation group of the other interlocutor, - if the first and second interlocutors are each part of a different conversation group, these groups can be merged. Conversely, if it is determined that the second interlocutor no longer participates in a conversation with the first interlocutor, for example if he is no longer turned towards or no longer looking at the first interlocutor, or if he has not answered him for a certain period of time, the second interlocutor can be removed from the group that he shared with the first interlocutor.
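The create/join/merge rules at the end of step E3 can be sketched as a small update function over sets of interlocutor identifiers; the function name and the use of plain string identifiers are illustrative assumptions:

```python
def update_groups(groups, a, b):
    """Apply the step-E3 rules: groups is a list of sets of interlocutor
    ids; a and b have just been found to converse together."""
    group_a = next((g for g in groups if a in g), None)
    group_b = next((g for g in groups if b in g), None)
    if group_a is None and group_b is None:
        groups.append({a, b})        # neither is grouped: create a new group
    elif group_a is None:
        group_b.add(a)               # a joins b's existing group
    elif group_b is None:
        group_a.add(b)               # b joins a's existing group
    elif group_a is not group_b:
        group_a |= group_b           # each in a different group: merge them
        groups.remove(group_b)
    return groups

groups = []
update_groups(groups, "Alice", "Bob")    # one group: {Alice, Bob}
update_groups(groups, "Carol", "Dan")    # two groups
update_groups(groups, "Bob", "Carol")    # the two groups are merged
# groups == [{"Alice", "Bob", "Carol", "Dan"}]
```

Removal of an interlocutor who stops participating, as described above, would be the symmetric operation: discarding his id from the shared set.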
The steps described above may be implemented repeatedly and applied to all interlocutors in the hearing-impaired person's environment, so as to regularly update the identified conversation groups.
There is a special case in which a speaker is part of two or more separate conversation groups and speaks alternately in several separate conversations. Such a situation may occur, for example, when an adult talking with other adults has to intervene at regular intervals with his children, who are nearby but not participating in the conversation. The hearing-impaired person may then be interested only in what the speaker says in a single conversation, the conversation between adults in the example given above, and may not want to hear what the speaker says in the context of the other conversation, for example the words addressed to the children.
When such an interlocutor speaks, the method may include a verification of the orientation of the face or gaze of the speaker to determine which group he is addressing. The speaker is then temporarily considered as belonging only to the conversation group he is addressing, so that his remarks are not reported to the other group. Thus, in the example given above, when the adult addresses his children by looking at them, he is considered to belong only to the conversation group formed with his children, and his words to the children will not be communicated to a hearing-impaired person in the adult conversation group.
In order to associate an interlocutor with a conversation group, that is to say with a particular conversation, the identity of this interlocutor must be memorized. If each interlocutor has a microphone, he can be identified by an identifier of the equipment acquiring his remarks. Such an identifier can be arbitrary or can correspond to a parameter such as an IP address. Alternatively, the computer equipment 3 of the hearing-impaired person can be connected to a database of voice signatures of interlocutors and can include a voice recognition module. The interlocutor can then be identified by memorizing a voice print. Thus, when an interlocutor whose voice print has been stored speaks again, his voice can be identified and the captured remarks can be attributed to this interlocutor. Such a database of voice signatures can also associate with each voice signature an identity datum of the corresponding interlocutor, such as his name. Alternatively, the computer equipment can be connected to a camera and a facial recognition database. Each interlocutor can then be identified visually by comparing an image of his face taken by the camera with reference images recorded in the facial recognition database. At the end of the determination step E3, the identified interlocutors who have spoken during a predetermined elapsed time can be displayed on the display device in the form of a list or an image, with an indication of the conversation group or groups to which each interlocutor belongs.
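Identification against a stored voice-signature database, as described above, can be sketched as follows. This is a minimal sketch under the assumption that each voice print has already been reduced to a numeric feature vector and that prints are compared by cosine similarity; the feature extraction itself, the vector sizes, and the threshold are assumptions, not details from the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def identify(voice_db, features, threshold=0.9):
    """Return the name whose stored voice print best matches, or None if unknown."""
    best_name, best_score = None, threshold
    for name, print_vec in voice_db.items():
        score = cosine(features, print_vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

db = {"Alice": [0.9, 0.1, 0.3], "Bob": [0.1, 0.8, 0.5]}
print(identify(db, [0.88, 0.12, 0.31]))  # close to Alice's print -> Alice
```

A captured utterance whose similarity falls below the threshold for every stored print would be treated as coming from an unregistered interlocutor.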
The hearing-impaired person wishing to follow a conversation with a first conversation group can then, in a group selection step E4, select this first conversation group from among the conversation groups thus determined.
The hearing-impaired person then determines whether he wishes the various interventions of each of the interlocutors of this first conversation group to be rendered to him in the form of text or voice. To do this, he determines a voice or textual reproduction mode for said first selected conversation group, in a restitution mode determination step E5.
In an implementation variant, the computer equipment 3 of the hearing-impaired person being connected to a motion detection device 13, the steps of selecting a conversation group and/or determining a restitution mode are implemented according to a movement of the hearing-impaired person 2 detected by the motion detection device.
In this implementation variant, the computer equipment can be connected to a gesture recognition device configured to recognize certain gestures of the hearing-impaired person, and to a database storing, in correspondence with each of these gestures and possibly depending on the interlocutor, the selection or not of a conversation group or the associated restitution mode.
According to another implementation variant, the computer equipment 3 of the hearing-impaired person may comprise a voice recognition module 14. A step of receiving a voice command from the hearing-impaired person 2, relating to a group or a restitution mode to select according to said voice command, can then be performed; the voice recognition module 14 is then configured to analyze said voice command. The hearing-impaired person can thus interact vocally with the computer equipment 3 to control the selection of a particular group or rendering mode. The recognition by the computer equipment 3 of a voice command issued by the hearing-impaired person may be effective only if the hearing-impaired person has previously performed a particular action, such as making a particular movement or pressing an interface button, or only as long as he performs such an action.
Alternatively, the hearing-impaired person can interact with the computer equipment 3 by thought. The computer equipment may for example be connected to a neural headset placed on the skull of the hearing-impaired person and configured to perform an electroencephalogram thereof. The user can thus issue commands to the computer equipment by thought, for example simple commands such as the selection of a particular group. The hearing-impaired person can thereby interact with his computer equipment without using his hands and without disturbing those around him, as a voice command could.
Then, when the voice reproduction mode has been determined for the first conversation group, during a voice signal transmission step E6 the computer equipment transmits the voice signals acquired for the interlocutors belonging to this first conversation group to the voice transmission module 4, for the latter to play back these voice signals. The hearing-impaired person thus selects the conversations that he wants to hear in voice form.
In an implementation variant, the voice transmission modules are directional loudspeakers. Such loudspeakers may for example be arranged in the room in which the hearing-impaired person and his interlocutors are located, so as to form a mesh capable of emitting directional sound to a person wherever he is located in the room. This network of loudspeakers can for example be fixed to the ceiling of the room. Since the position of each loudspeaker is known and the hearing-impaired person can be located as described previously, the computer equipment can, when the voice reproduction mode has been determined, transmit the acquired voice signals to the loudspeaker most appropriate for transmitting these voice signals to the hearing-impaired person. Such an implementation variant thus spares the hearing-impaired person from having to wear headphones or earphones. This variant can also be implemented for well-hearing people to personalize their sound environment, so as to bring them better hearing comfort.
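Choosing the most appropriate loudspeaker in such a ceiling mesh can be as simple as picking the one closest to the listener's current position. The sketch below assumes 2D coordinates and a pure distance criterion; a real system might also weigh in each loudspeaker's orientation.

```python
import math

def nearest_speaker(speakers, listener):
    """speakers: {id: (x, y)} fixed positions; listener: (x, y) current location.
    Return the id of the closest loudspeaker."""
    return min(speakers, key=lambda s: math.dist(speakers[s], listener))

mesh = {"S1": (0.0, 0.0), "S2": (4.0, 0.0), "S3": (2.0, 3.0)}
print(nearest_speaker(mesh, (3.5, 0.5)))  # -> S2
```

The voice signals of the selected conversation group would then be routed to the returned loudspeaker, and the choice re-evaluated as the hearing-impaired person moves.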
When the system for following a conversation includes a plurality of geolocatable microphones for capturing the voice signals, a variant can be implemented to limit the problems related to the failures of these microphones and to capture the voice signals using the most efficient microphone. This variant requires a prior calibration phase to determine the performance of each microphone of the plurality of microphones according to its distance and orientation with respect to the speaker. These performances can be determined by having a speaker located at different positions relative to the microphone utter words, applying a word recognition algorithm to the voice signals captured by the microphone, then calculating the ratio of recognized words among the words uttered for each of these positions. Then, in the operating phase, depending on the orientation of the face and/or the location of the interlocutor, and on the position and orientation of the microphones, the most efficient microphone is selected to capture the words of this interlocutor. The selection of a microphone can also take into account a state of this microphone, for example a battery level.
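The calibration-then-selection scheme described above can be sketched as follows. The data structures, the position buckets, and the battery cutoff are assumptions introduced for the example, not details from the patent.

```python
def recognition_ratio(emitted_words, recognized_words):
    """Calibration metric: fraction of emitted words the microphone recognized."""
    emitted = set(emitted_words)
    return len(emitted & set(recognized_words)) / len(emitted)

def select_microphone(profiles, speaker_position, battery, min_battery=0.2):
    """profiles: {mic_id: {position_bucket: calibrated ratio}};
    battery: {mic_id: charge level in [0, 1]}.
    Pick the best-scoring microphone for this position, skipping drained ones."""
    candidates = [(profiles[m].get(speaker_position, 0.0), m)
                  for m in profiles if battery.get(m, 0.0) >= min_battery]
    return max(candidates)[1] if candidates else None

profiles = {"micA": {"near-front": 0.95, "far-side": 0.40},
            "micB": {"near-front": 0.70, "far-side": 0.85}}
battery = {"micA": 0.05, "micB": 0.9}   # micA is almost drained
print(select_microphone(profiles, "near-front", battery))  # -> micB
```

With a full battery, micA would win the "near-front" bucket on its calibrated ratio alone; the battery check implements the microphone-state criterion mentioned in the text.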
On the other hand, when the textual reproduction mode has been determined for the first conversation group, the voice signals acquired for the interlocutors belonging to the first conversation group are converted into text signals by the signal conversion module 6 and transmitted to the computer equipment, which then controls their display on the display device 5 during a display control step E7. The hearing-impaired person thus selects the conversations for which he wishes to have a transcript in text form, so that he can follow them in the background without hindering his understanding of the spoken conversation.
When each interlocutor has his own microphone, each interlocutor can also have his own module for converting voice signals into text signals. The method may then implement a step of capturing the voice signals of each interlocutor using his microphone, a step of converting the captured voice signals into text signals by his conversion module, and a step of transmitting to the computer equipment of the hearing-impaired person the captured voice signals or the text signals obtained, according to the selected playback mode.
When the amount of text signals to be displayed on the display device is large, the computer equipment can replace all or part of the text to be displayed by a summary thereof. The text can be summarized by simple sentences or even by isolated words expressing concepts. Such text modification can be triggered manually by the user of the computer equipment, for one or more of his interlocutors, or automatically when the amount of text to display exceeds a predetermined threshold. Displaying a summary text can reduce the fatigue of the hearing-impaired person in fast conversations or in conversations with many participants, in which it may be difficult to read all of the remarks over a long period.
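The threshold-triggered summarization described above can be sketched as follows. This is a deliberately crude stand-in: a real system would use a proper summarizer, whereas this sketch simply keeps the first sentence once the predetermined character threshold is exceeded. The threshold value is an assumption.

```python
def text_to_display(text, max_chars=120):
    """Return the text unchanged, or a crude one-sentence summary when too long."""
    if len(text) <= max_chars:
        return text
    # Keep only the first sentence as a stand-in for a real summary.
    first_sentence = text.split(". ")[0].rstrip(".") + "."
    return first_sentence

short = "See you at noon."
long_text = ("The regatta committee postponed the start because of the wind. "
             "They will announce a new schedule after lunch, probably around "
             "two, once the forecast has been confirmed by the harbour master.")
print(text_to_display(short))      # displayed in full
print(text_to_display(long_text))  # only the first sentence is shown
```

The same function could be applied per interlocutor, matching the manual per-interlocutor trigger mentioned in the text.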
In one embodiment, the hearing-impaired person can select multiple conversation groups at the same time. He can select for each of these groups, independently of the others, the voice restitution mode or the textual restitution mode, and thus follow several conversations in parallel, some in voice form, others in textual form. While the purpose of the method is to select interlocutors automatically through conversation groups, the hearing-impaired person can fine-tune this selection by manually adding an interlocutor to a conversation group, or by removing one, or by requesting a restitution of the words of one or several interlocutors according to their conversation groups.
In some situations, the hearing-impaired person may be involved in a conversation with multiple interlocutors who cannot all be in his field of view simultaneously. This is the case, for example, if the hearing-impaired person is having a conversation at the table with interlocutors to his right and left. Hearing-impaired people often compensate for their hearing problems with visual information, such as facial expressions, so it is essential for a hearing-impaired person to be able to see his interlocutors and to be able to read their emotions. The computer equipment 3 can be connected to at least one camera capturing an image of one or more interlocutors of the hearing-impaired person. Following the selection of a conversation group, the computer equipment 3 can display an image of the interlocutors belonging to this conversation group who are filmed by at least one camera and located outside the field of vision of the hearing-impaired person, in a thumbnail inserted into the image displayed on said display device 5. The hearing-impaired person can thus perceive the visual information emanating from these interlocutors even though they are not in his field of vision. In order to display only the image of the face of these interlocutors, the computer equipment 3 may comprise an image recognition or image processing system isolating the faces of the interlocutors in the images captured by the at least one camera. Such a display can be updated dynamically, to detect interlocutors entering or leaving the field of vision of the hearing-impaired person according to the movements of the person and of the interlocutors. The images of the interlocutors displayed on the display device 5 can be updated according to this detection, so that the hearing-impaired person can permanently visualize an image of all these interlocutors.
In an implementation variant, the computer equipment of the hearing-impaired person being connected to a database comprising multiple expressions of each of the interlocutors and to a face expression change detection device, the method may display the face of those interlocutors of the hearing-impaired person who are out of his field of vision and are changing facial expression. The detection of such a change of expression may be accompanied by the emission of an alarm, for example an audible one, in order to draw the hearing-impaired person's attention to this event. The hearing-impaired person may also be warned of such a detection by vibration. Such a vibration can be emitted by one vibrator among a plurality of vibrators placed in contact with the skin of the hearing-impaired person, for example integrated into his clothes. The computer equipment may choose which vibrator to activate so as to trigger the vibration of the vibrator positioned on the side of the hearing-impaired person's body facing the interlocutor who changed expression. For example, if this interlocutor is positioned to the left of the hearing-impaired person, the computer equipment may vibrate a vibrator positioned on the left arm or the left flank of the hearing-impaired person, who is thus informed of the direction in which to turn to see the person who changed expression.
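Choosing the vibrator that points toward the interlocutor can be sketched as a nearest-bearing selection. The bearing representation (degrees relative to the hearing-impaired person, 0 = straight ahead) and the vibrator placements are assumptions for the example.

```python
def choose_vibrator(vibrators, interlocutor_bearing):
    """vibrators: {id: bearing in degrees}. Pick the vibrator whose bearing is
    closest (in circular angle) to the interlocutor's bearing."""
    def angular_gap(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)   # shortest way around the circle
    return min(vibrators,
               key=lambda v: angular_gap(vibrators[v], interlocutor_bearing))

worn = {"chest": 0, "right-arm": 90, "back": 180, "left-arm": 270}
print(choose_vibrator(worn, 260))  # interlocutor on the left -> left-arm
```

The circular gap matters near 0/360 degrees: an interlocutor at bearing 350 is closest to the chest vibrator at 0, not to the back one.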
The method may further include an initial calibration step during which interlocutors can be registered in the system. Such a registration may consist in memorizing an identifier of the interlocutor, possibly his name, accompanied by a voice print and/or a face print such as a 3D image or a photograph. Such a registration may also include a series of photographs of the face of the interlocutor as it forms various angles with respect to a reference direction. Such a face print can then be used by the face orientation detection system to determine the orientation of the face of the interlocutor relative to a direction. The computer equipment 3 of the hearing-impaired person may be connected to a conversation database storing data relating to past conversations followed by the hearing-impaired person in voice or textual form. Such a database may for example memorize, for these conversations, the identity of the identified interlocutors, the selection choices made by the hearing-impaired person, location data of the conversation, etc. From these data, in a given situation where the hearing-impaired person is surrounded by various interlocutors, the computer equipment 3 can automatically preselect a conversation group including the interlocutors most often selected in the past by the hearing-impaired person. The computer equipment 3 can also automatically trigger the vocal reproduction, or respectively the display in text mode, of the speakers present whose group has most often been selected in the past by the hearing-impaired person for voice or textual reproduction respectively. This makes it possible to automatically offer the hearing-impaired person the conversations most interesting to him. The selection of at least one first group for a vocal reproduction and/or a second group for a textual reproduction is thus performed according to the preferences and/or habits of the hearing-impaired person.
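The history-based preselection described above can be sketched as a frequency count over past selections. The representation of the history as a list of selected-interlocutor sets, and the cutoff of three, are assumptions for the example.

```python
from collections import Counter

def preselect_group(history, present, top_n=3):
    """history: list of sets of interlocutors the person selected in past
    conversations; present: set of interlocutors currently around the person.
    Return the present interlocutors most often selected before, best first."""
    counts = Counter()
    for past_group in history:
        counts.update(past_group & present)   # only count those present now
    return [name for name, _ in counts.most_common(top_n)]

history = [{"Alice", "Bob"}, {"Alice", "Carol"}, {"Alice", "Bob", "Dan"}]
present = {"Alice", "Bob", "Eve"}
print(preselect_group(history, present))  # Alice selected 3 times, Bob 2
```

The same counts could be kept separately per restitution mode, so that the group most often followed in voice form is preselected for voice and likewise for text.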
The update of the conversation database at the end of each conversation allows the computer equipment 3 to react in a personalized manner, by learning based on an analysis of the behavior of the hearing-impaired person. The computer equipment 3 may be provided with a conversation recording function and may provide the hearing-impaired person with a visualization of the recorded conversation history. The hearing-impaired person may interact with the computer equipment 3 to browse this history using interface buttons or through a touch screen. In the case of a 3D or augmented-reality display, the acquired voice and text signals can be staggered in the depth of the display according to their acquisition date. In the case of a large number of interlocutors, such a display, combining a display in depth to show the history of the remarks and a display at different viewing angles to differentiate the interlocutors, ensures better visibility of the reproduced conversations.
The hearing-impaired person can thus effectively follow at least one ongoing conversation, even when the participants in these conversations change regularly and/or participate in other conversations, without having to pay particular attention or make significant effort.
Claims (10)
1. A method of assisting in following a conversation with a plurality of interlocutors (1), each of whom may belong to at least one conversation group, for a hearing-impaired person (2) equipped with computer equipment (3), said computer equipment being connected to at least one voice transmission module (4), to at least one display device (5), to at least one module (6) for converting voice signals into text signals and to at least one voice activity detection system (7), said method comprising the following steps implemented by said computer equipment: - acquisition (E1) of a signal representative of a voice activity of a first interlocutor among the plurality of interlocutors, using said voice activity detection system (7), - acquisition (E2) of a signal characterizing a behavior of a second interlocutor among the plurality of interlocutors in response to said voice activity of said first interlocutor, - determination (E3) of a membership of the first and the second interlocutor in the same conversation group, called first conversation group, according to the signals acquired for the first interlocutor and the second interlocutor, - selection (E4) of said first conversation group among said conversation groups, - determination (E5) of a voice or textual restitution mode for said selected first conversation group, - when the voice restitution mode has been determined for the first conversation group: transmission (E6), to the voice transmission module (4), of the voice signals acquired for the interlocutors belonging to the first selected conversation group, for transmitting said voice signals, - when the textual restitution mode has been determined for the first conversation group: control of the display (E7), on the display device, of text signals obtained by conversion of the voice signals acquired for the interlocutors belonging to the first selected conversation group, using the signal conversion module (6).
2. A method of assisting in following a conversation according to the preceding claim, said computer equipment being further connected to at least one face orientation detection system (10) and to synchronization equipment (11) configured to determine a correlation between signals relating to different interlocutors, in which: - the acquisition step (E1) of a signal representative of a voice activity of a first interlocutor comprises a determination of a first binary variable representative of a voice activity of said first interlocutor, - the acquisition step (E2) of a signal characterizing a behavior of a second interlocutor comprises a determination of a second binary variable representative of a detection of an orientation of the face of the second interlocutor in the direction of the first interlocutor using the face orientation detection system, - the determination step (E3) of a membership of the first and second interlocutors in the same conversation group comprises determining, using said synchronization equipment, a correlation between said first and second variables.
3. A method of assisting in following a conversation according to claim 1 or 2, said computer equipment being further connected to at least one gaze orientation detection system (12) and to synchronization equipment (11) configured to determine a correlation between signals relating to different interlocutors, in which: - the acquisition step (E1) of a signal representative of a voice activity of a first interlocutor comprises a determination of a first binary variable representative of a voice activity of said first interlocutor, - the acquisition step (E2) of a signal characterizing a behavior of a second interlocutor comprises a determination of a third binary variable representative of a detection of an orientation of the gaze of the second interlocutor towards the first interlocutor using the gaze detection system, - the determination step (E3) of a membership of the first and second interlocutors in the same conversation group comprises determining, using said synchronization equipment, a correlation between said first and third variables.
4. A method of assisting in following a conversation according to claim 1, wherein: - the acquisition step (E2) of a signal characterizing a behavior of a second interlocutor comprises a determination, using said voice activity detection system, of a voice activity of the second interlocutor, - the membership (E3) of the first and second interlocutors in the same conversation group is determined according to a delay between an end of voice activity of the first interlocutor and a beginning of voice activity of the second interlocutor.
5. Method according to one of the preceding claims, wherein, the computer equipment (3) of the hearing-impaired person being connected to a device (13) for detecting movements of the hearing-impaired person (2), the steps of selecting a conversation group and/or determining a restitution mode are implemented according to a movement of the hearing-impaired person detected by the motion detection device.
6. Method according to one of the preceding claims, wherein a voice activity of a speaker is determined by spectral analysis of first voice signals acquired for said interlocutor, by word recognition applied to said first voice signals, or by detection of a labial movement of said interlocutor.
7. A computer program product comprising code instructions for executing a method according to any one of the preceding claims when the program is executed by a processor.
8. Computer equipment (3) for assisting in following a conversation with a plurality of interlocutors (1) for a hearing-impaired person (2), configured to be connected to at least one voice transmission module (4), to at least one display device (5), to at least one module (6) for converting voice signals into text signals and to at least one voice activity detection system (7), said computer equipment (3) further being configured to carry out the steps of the method according to any one of claims 1 to 6.
9. System (14) for following a conversation with a plurality of interlocutors for a hearing-impaired person (2), comprising: computer equipment (3) according to the preceding claim, and a face (10) and/or gaze (12) orientation detection system comprising at least one stereoscopic camera configured to detect the orientation of the face and/or gaze of a speaker.
10. System (14) for following a conversation with a plurality of interlocutors for a hearing-impaired person (2), comprising: computer equipment (3) according to claim 8, and a face orientation detection system (10) comprising at least one geolocation system and at least one gyrometer and/or accelerometer configured to determine the location of an interlocutor and the orientation of his face relative to the other interlocutors.
Similar technologies:
Publication number | Publication date | Patent title
US10970037B2|2021-04-06|System and method for differentially locating and modifying audio sources
EP3412036B1|2020-03-18|Method for assisting a hearing-impaired person in following a conversation
EP3081011B1|2019-09-18|Name-sensitive listening device
US20190220933A1|2019-07-18|Presence Granularity with Augmented Reality
US9949056B2|2018-04-17|Method and apparatus for presenting to a user of a wearable apparatus additional information related to an audio scene
WO2019013849A1|2019-01-17|Providing an ambient assist mode for computing devices
US20150172830A1|2015-06-18|Method of Audio Signal Processing and Hearing Aid System for Implementing the Same
US9263044B1|2016-02-16|Noise reduction based on mouth area movement recognition
TW201535156A|2015-09-16|Performing actions associated with individual presence
WO2014120291A1|2014-08-07|System and method for improving voice communication over a network
US20170303052A1|2017-10-19|Wearable auditory feedback device
TW201543902A|2015-11-16|Muting a videoconference
EP3149968B1|2018-08-15|Method for assisting with following a conversation for a hearing-impaired person
JP2009060220A|2009-03-19|Communication system and communication program
JP4585380B2|2010-11-24|Next speaker detection method, apparatus, and program
Donley et al.2021|EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments
US20210174823A1|2021-06-10|System for and Method of Converting Spoken Words and Audio Cues into Spatially Accurate Caption Text for Augmented Reality Glasses
US11164341B2|2021-11-02|Identifying objects of interest in augmented reality
US20220066207A1|2022-03-03|Method and head-mounted unit for assisting a user
US11163522B2|2021-11-02|Fine grain haptic wearable device
WO2019198405A1|2019-10-17|Information processing device, information processing system, information processing method and program
EP3855348A1|2021-07-28|Error management
JP2021533510A|2021-12-02|Interaction method and equipment
WO2021154402A1|2021-08-05|Extracting information about people from sensor signals
Patent family:
Publication number | Publication date
EP3412036B1|2020-03-18|
FR3047628B1|2018-05-25|
EP3412036A1|2018-12-12|
ES2795016T3|2020-11-20|
WO2017134300A1|2017-08-10|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
US6593956B1|1998-05-15|2003-07-15|Polycom, Inc.|Locating an audio source|
US7117157B1|1999-03-26|2006-10-03|Canon Kabushiki Kaisha|Processing apparatus for determining which person in a group is speaking|
US20020103649A1|2001-01-31|2002-08-01|International Business Machines Corporation|Wearable display system with indicators of speakers|
US20040172255A1|2003-02-28|2004-09-02|Palo Alto Research Center Incorporated|Methods, apparatus, and products for automatically managing conversational floors in computer-mediated communications|
US20130144622A1|2010-09-28|2013-06-06|Maki Yamada|Speech processing device and speech processing method|
US20130300648A1|2012-05-11|2013-11-14|Qualcomm Incorporated|Audio user interaction recognition and application interface|
US8128342B2|2008-10-09|2012-03-06|Manufacturing Resources International, Inc.|Multidirectional multisound information system|
US8441515B2|2009-09-17|2013-05-14|Sony Corporation|Method and apparatus for minimizing acoustic echo in video conferencing|
US9532153B2|2012-08-29|2016-12-27|Bang & Olufsen A/S|Method and a system of providing information to a user|
US11264029B2|2019-01-05|2022-03-01|Starkey Laboratories, Inc.|Local artificial intelligence assistant system with ear-wearable device|
US11264035B2|2019-01-05|2022-03-01|Starkey Laboratories, Inc.|Audio signal processing for automatic transcription using ear-wearable device|
EP3716650A1|2019-03-28|2020-09-30|Sonova AG|Grouping of hearing device users based on spatial sensor input|
EP3793210A1|2019-09-11|2021-03-17|Oticon A/s|A hearing device comprising a noise reduction system|
Legal status:
2017-02-09| PLFP| Fee payment|Year of fee payment: 2 |
2017-08-11| PLSC| Publication of the preliminary search report|Effective date: 20170811 |
2018-02-16| PLFP| Fee payment|Year of fee payment: 3 |
2020-02-07| PLFP| Fee payment|Year of fee payment: 5 |
2021-11-12| ST| Notification of lapse|Effective date: 20211005 |
Priority applications:
Application number | Filing date | Patent title
FR1650930A|FR3047628B1|2016-02-05|2016-02-05|METHOD FOR MONITORING CONVERSATION FOR A MISSING PERSON|
FR1650930|2016-02-05|FR1650930A| FR3047628B1|2016-02-05|2016-02-05|METHOD FOR MONITORING CONVERSATION FOR A MISSING PERSON|
ES17703733T| ES2795016T3|2016-02-05|2017-02-06|Assistance procedure in following a conversation for a person with hearing problems|
PCT/EP2017/052533| WO2017134300A1|2016-02-05|2017-02-06|Method for assisting a hearing-impaired person in following a conversation|
EP17703733.0A| EP3412036B1|2016-02-05|2017-02-06|Method for assisting a hearing-impaired person in following a conversation|